MATRIX INVERSION IS AS EASY AS EXPONENTIATION 
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Abstract. We prove that the inverse of a positive-definite matrix can be approximated by a 
weighted-sum of a small number of matrix exponentials. Combining this with a previous result [6], we 
establish an equivalence between matrix inversion and exponentiation up to polylogarithmic factors. 
In particular, this connection justifies the use of Laplacian solvers for designing fast semi-definite 
programming based algorithms for certain graph problems. The proof relies on the Euler-Maclaurin 
, formula and certain bounds derived from the Riemann zeta function. 
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>v 1. Matrix Inversion vs. Exponentiation 

a ■ a f < 

Given a symmetric n x n matrix A, its matrix exponential is denned to be e A = X^i>o TP This 



operator is of fundamental interest in several areas of mathematics, physics, and engineering, and has 
recently found important applications in algorithms, optimization and quantum complexity. Roughly, 
these latter applications are manifestations of the matrix-multiplicative weight update method and 
its deployment to solve semi-definite programs efficiently (see [HEIIS])- For fast graph algorithms, 
the quantity of interest is e~ L v, where L is the combinatorial Laplacian of a graph, and v is a 



(N 

in 
Q 

& ' vector. The vector e v can also be interpreted as the resulting distribution of a certain continuous- 
time random walk on the graph with starting distribution v. In [6], appealing to techniques from 
approximation theory, the computation of e~ L v was reduced to a small number of computations of 
the form L~ 1 u. Thus, using the near-linear-time Laplacian solved due to Spielman and Teng [7], 
this gives an 0(m)-time algorithm for approximating e~ L v for graphs with m edges. The question of 
whether the Spielman- Teng result is necessary in order to compute e~ L v in near-linear time remained 
open, see [91 Chapter 9]. We answer this question in the affirmative by presenting a reduction in the 
other direction, again relying on analytical techniques. The following is our main result. 

Theorem 1.1. Given e, 5 € (0,1], there exist poly(log l /&e) numbers < Wj,tj = 0(poly(Y<5e)), such 
that for all symmetric matrices A satisfying SI X A X /, (1 — e)A~ 1 X Wje~ tiA X (1 + e)A~ l . 

This proves that the problems of matrix exponentiation and matrix inversion are equivalent up to 
polylogarithmic factors. This result justifies the somewhat surprising use of Laplacian solvers for 
matrix-exponential based methods for designing fast semi-definite programming based algorithms for 
certain graph problems. Note that this equivalence does not require the matrix A to be a Laplacian, 
but only that it be a symmetric positive-definite matrix. It would be interesting to investigate if this 
result can be used to construct fast solvers for linear systems more general than those arising from 
graph Laplacians. Finally, note that the numbers Wj,tj in the above theorem are independent of the 
matrix A, and are given explicitly in the proof. 

The proof of Theorem 11.11 follows from the lemma below, which gives such an approximation in 
the scalar world. 

Lemma 1.2. Given e,5 € (0,1], there exist poly (log 1 /&e) numbers < uij,tj = 0(poly( 1 /fe)), such 
that for all x G [5, 1], (1 - e)x~ 1 < £V WjfT** < (1 + e)x" 1 . 

Note that as x approaches from the right, x~ l is unbounded, where as e~ tx is bounded by 1 for any 
t > 0. This justifies the assumption that x € [5, 1]. Versions of this lemma were proved in [21 [3]. Our 
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^A Laplacian solver is an algorithm that (approximately) solves a given system of linear equations Lx = 6, where L 
is a graph Laplacian and 6 £ Im(L), i.e., it (approximately) computes L~ 1 b, see [9]. 
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proof is simple and self-contained. We attempt to make the analytical techniques used in the proofs 
accessible to a wider theory audience. We begin by showing how Lemma 11.21 implies Theorem 1 1.11 



Proof of Theorem 11.11 Let {Aj}j be the eigenvalues of A with corresponding eigenvectors 
Since A is symmetric and SI ^ A ^ /, we have Aj 6 [5, 1], for all i. Let Wj,tj > denote the numbers 
given by Lemma 11.21 for parameters e and 5. Thus, Lemma 11.21 implies that all i, (1 — e)A~ 1 < 
Ej«y« / A < (1 + e)Ar 1 . Note that if Aj is an eigenvalue of A, then Aj 1 is the corresponding 
eigenvalue of A" 1 and e~ ijAi is that of e~ ijj4 with the same eigenvector. Thus, multiplying the 
scalar inequalities by UiuJ and summing up, we obtain the matrix inequality (1 — e) J2i \~ l UiuJ -< 
Ej Wj Ei e'^UiuJ 1 (1 + e) Y,i K lu i u J- Hence > (! " e ) j4 ~ 1 ^ Ej Wje"*^ ^ (1 + e)^ 1 . 

1.1. Integral Representation, Discretization and Smoothness. The starting point of the proof 
of Lemma 11.21 is the easy integral identity x~ l = e~ xt dt. Thus, by discretizing this integral to a 
sum, the fact that one can approximate x _1 as a weighted sum of exponentials as claimed Lemma [1.2l 
is not surprising. The crux is to prove that this can be achieved using a sparse sum of exponentials. 
One way to discretize an integral to a sum is the so called trapezoidal rule. If g is the integrand, and 
[a, b] is the interval of integration, this rule approximates the integral J b g(t)dt by covering the area 
under g in the interval [a, b] using trapezoids of small width, say h, as follows: 
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■J a 



b h K-l 

g(t)dt « T^ h d ^ f ^.J2(9(a + jh)+g(a + (j + l)h)), 

3=0 



where K = f is an integer. The choice of h determines the discretization of the interval [a, b], and 
hence K, which is essentially the sparsity of the approximating sum. To apply this to the integral 
representation for x~ l , we have to first truncate the infinite integral J Q e~ xt dt to a large enough 
interval [0, b], and then bound the error in the trapezoidal rule. Recall that the error needs to be of 
the form 

x" 1 - | £\ (e~ xjh + e - x( -i + V h ) < ex- 1 . 

For such an error guarantee to hold, we must have xh < O e {\). Thus, if we want the approximation 
to hold for all < x < 1, we require h < O e (l), which in turn implies that K > Q E (b). Also, if we 
restrict the interval to [0, b], the truncation error is e~ xt dt = x~ 1 e~ bx , forcing b > J" 1 log l /e for 
this error to be at most £ /x for all x € [5, 1]. Thus, this way of discretizing can only give us a sum 
which uses poly( 1 /<s) exponentials, which does not suffice for our application. 

This suggests that we should pick a discretization such that t, instead of increasing linearly with 
h, increases much more rapidly. Thus, a natural idea is to allow t to grow geometrically. This can 
be achieved by substituting t = e s in the above integral to obtain the identity x~ l = e~ xe +s ds. 
We show that discretizing this integral using the trapezoidal rule does indeed give us the lemma. 

For convenience, we define f x {s) = f e~ xe " +s . First, observe that f x {s) = x~ l ■ fi(s + lnx). Since 
we also allow the error to scale as as x varies over [5, 1], s needs to change only by an additive 
log Y<5 to compensate for x. Roughly, this suggests that when approximating this integral by the 
trapezoidal rule, the dependence on 1 /s is likely logarithmic, instead of polynomial. The proof 
formalizes this intuition and uses the fact that the error in the approximation by the trapezoidal 
rule can be expressed using the Euler-Maclaurin formula (see Section 12. It) which involves higher 
order derivatives of f x . We establish the following properties about the derivatives of f x which, when 
combined with known estimates on Bernoulli numbers obtained from the Riemann zeta function, 
allow us to bound this error with relative ease (see Section f 2 . 2 p : (1) All the derivatives of f x up to 
any fixed order, vanish at the end points of the integration interval (in the limit). (2) The derivatives 
of f x are reasonably smooth] the L\ norm of the k-th. derivative is bounded roughly by x~ l k k (see 
Lemma ll.4p . In summary, this allows us to approximate x~ l as an infinite sum of exponentials. In 
this sum, the contribution beyond about poly(log terms turns out to be negligible, and hence 
we can truncate the infinite sum to obtain our final approximation (see Section f2.3|) . 
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We now present simple properties of the derivatives of f x , alluded to above, which underlie the 
technical intuition as to why an approximation of the kind claimed in Lemma 11.21 should exist. Let 
fx k \s) denote the k th derivative of the function f x with respect to s. The first fact relates f x k \s) 
to f x (s). 

Fact 1.3. For any non-negative integer k, f x k \s) = fx{s)^ k =QCkj{—xe s y, where Ckj are some 
non-negative integers satisfying X^^=o Cfc J — + 



Proof We prove this lemma by induction on k. For k = 0, we have /i (s) = f x (s). Hence, f x v> is 
of the required form, with co,o = 1 5 and X^j=o c o,j = 1- Hence, the claim holds for k = 0. Suppose 
the claim holds for k. Hence, fx (s) = fx(s)^2 k =0 Ckj(—xe s y, where c^j are non-negative integers 
satisfying Y^ k j=o c k,j < {k + l) h+1 . We can compute fi k+1 \s) as follows, 



f(0) 



fi k+1) (s) 



k \ fc 

J2 c k,j(-xe s ) j f x (s) =^2c kJ {j - xe s + l){-xe s ) j f x {s) 

fx(s) + l)cfe,j + c fcj -i)(-xe s ) J , 

j=0 



= f 0, and Ck.-i = f 0. Thus, if we define Ck+ij = f (i + f)ck,j + Ckj~i, we 



where we define c^+i 

get that Ck+ij > 0, and that is of the required form. Moreover, we get, Sj=o c k+\ ,j < 

(jfe + 2)(jfe + l) fc+1 + (k + l) k+1 = {k + 3)(jfe + l)(jfe + l) fe < (Jfe + 2) 2 (A; + l) fc < (k + 2) k+2 . This proves 
the claim for k + 1 and, hence, the fact follows by induction. ■ 



The next lemma uses the fact above to bound the L\ norm of /. 



(fc) 



Lemma 1.4. For every non-negative integer k, f 



CO 



/i fc) (*; 



d S < 2 . e *(jfc + 1 



Proo/. By Fact O /_ 



DC 



(is is at most 



f 

j — i 



Fact[L3] 1 , , 

-(fc + l) fc+1 



s ds 



t=xe 



< 



l dt + 




-ty 



(k + l) 



k+l 



[1 + kl) < --e k {k + l) 



2k 



where the last inequality uses k + 1 < e k , and 1 + k\ < 2(k + l) fc . ■ 

We conclude this section by giving a brief comparison of our proof to that from [2]. While the 
authors in [2] employ both the trapezoidal rule and the Euler-Maclaurin formula, our proof strategy 
is different and leads to a shorter and simpler proof. In contrast to the previous proof, we use the 
Euler-Maclaurin formula in the limit over [—00,00], and since the derivatives of f x vanish in the 
limit, we save considerable effort in bounding the derivatives at the end points of the integral, which 
is required when using the Euler-Maclaurin formula to bound the error. We manage to use simpler 
bounds, at the cost of slightly worse parameters. On the way, we obtain an approximation of x~ l as 
an infinite sum of exponentials that holds for all x > 0, which we believe is interesting in itself. 

2. Proof of Lemma [TT21 

Before we introduce the Euler-Maclaurin formula which captures the error in the approximation 
of an integral by the trapezoidal rule, we introduce the Bernoulli numbers and polynomials, bounds 
on which are derived using a connection to the Riemann zeta function. 
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2.1. Bernoulli Polynomials and Euler-Maclaurin Formula. The Bernoulli numbers, denoted 
by bk for any integer k > 0, are a sequence of rational numbers which, while discovered in an 
attempt to compute sums of the form Ya>q S > nave deep connections to several areas of mathematics, 
including number theory and analysis!! They can be defined recursively as: &o = lj an d the following 
equation which is satisfied for all positive integers k > 2, Sj=o = ^- This implies that (e* — 

1) SfeLo^i! = Further, it can be checked that | + ^rrx is an even function, thus implying that 
bk = for odd fc > 2. Given the Bernoulli numbers, the Bernoulli polynomials are defined as 

B k {s) = f Y!j=o ^) b j sk ~ j - ft follows from the definition that, for all k, and j < k, = B frJ$ . 

We also get Bq(s) = 1, Bi(s) = s — i. Moreover, using the definition of Bernoulli numbers, we 
get that Bk(0) = -£>&(!) = bk for all k > 2. We also need the following bounds on the Bernoulli 
polynomials and the Bernoulli numbers. 

Lemma 2.1. For any non-negative integer k, and for all s € [0, 1], ^|fc)t^ — 7^1 — 



(2tt)2 



Proof. The first inequality follows from a well-known fact that |i?2fc(s)| < |&2fc| for ah s G [0, 1] (see 
[4]). For the second inequality, we recall the following connection between Bernoulli numbers and the 

Riemann zeta function for any even positive integer, proved by Euler (see [3]), C(2&) = f Y2j>i J~ 2?c = 

V X ^ 2-(2fe)! ' ±nUb ' (2fc)! ~ (27r)2fc 2^j>l J ^ ^vr; • ■ 

One of the most significant connections in analysis involving the Bernoulli numbers is the Euler- 
Maclaurin formula which describes the error in approximating an integral by the trapezoidal rule. 

Lemma 2.2 (Euler-Maclaurin Formula). Given a function g : R — > R, for any a < b, any positive 
h, and any positive integer N G N, we have, 

(1) 

f\{s)ds - 3}* = J* B2N ^- y [s]) g^\a + sh)ds - £ ^Lfc* - ^(a] 

where K = is an integer, and [■] denotes the integer part. 

Note that the Euler-Maclaurin formula is really a family of formulae, one each for the choice of N, 
which we call the order of the formula. Also note that this formula captures the error exactly. This 
error can be much less than the naive bound obtained by summing up the absolute value of the error 
due to each trapezoid. The first term in (PQ), after removing the contribution due to the Bernoulli 
polynomials via Lemma [2. H can be bounded by the L\ norm of g^ 2N ^ ■ The second term in (pQ) depends 
only on g( 2Ar_1 ) evaluated at the ends of the interval. The choice of N is influenced by how well 
behaved the higher order derivatives of the function are. For example, if g(s) is a polynomial, when 
2N > degree(<7), we get an exact expression for g(s)ds in terms of the values of the derivatives of 
g at a and b. 

In the next section, we use the Euler-Maclaurin formula to bound the error in approximating the 
integral J f x (s)ds using the trapezoidal rule. For our application, we pick a and b such that the 
derivatives up to order 2iV — 1 at a and b are negligible. Since the sparsity of the approximation is 
Q(yh), for the sparsity to depend logarithmically on the error parameter e, we need to pick N to be 
roughly ^(logi/e), so that the first error term in ([T]) is comparable to e. 

We end this section by giving a proof sketch for the Euler-Maclaurin formula (see also [8]). By a 
change of variables, it suffices to prove the formula for h = 1 and for the interval [0, 1]. Consider the 



2 The story goes that when Charles Babbage designed the Analytical Engine in the 19th century, one of the most 
important tasks he hoped the Engine would perform was the calculation of Bernoulli numbers. 
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integral B ^ N y S ^ g(s)ds, and apply integration by partial to it repeatedly to obtain 

,1 R (27V), 



2N [ ) a(s)ds = =m 

(2N)l 9[ ' (2N)\ 



-9{s, 



D 



(2N-2) 
2N 1 



B 2N (s) 
(2N)\ 



9 



(2N-1) 



(2N)\ 
1 



+ 



D. 



(2N-3) 
2N 



(22V) 



+ 



lB 2iv( 5 ) 5 ( 2 Ar) (s)ds 



o io (22V)! 



(fc) 

Using the fact that for all k < 2N, = ^ A^ff i an d rearranging, we get, 



' (2N)\ ~ (2N-k)\ 
1 2N 



B (s)g(s)ds - B 1 ( a )g(s) 



E( _ 1)fe - 1 



1 B 2 n(s) n{2N) 



+ 
-'o 



(2N)\ 



9^>(s)ds. 



fc=2 

Now, using Bq(s) = 1, we get that the first term on the l.h.s. is g(s)ds. Also, since B\(l) = 

1/2, -Bi(O) = — V 2 ; we see that the second term on the l.h.s. is 1/2 • (g(0) + g(l)) = tJ ' 1 ^' 1 . Finally, 
using -Bfc(O) = Sfc(l) = 6fc for k > 2, and that 6& = when k > 2 is odd, we get the desired formula. 



2.2. Approximation Using an Infinite Sum. As mentioned in Section [1.11 we approximate the 
integral f x (s)ds using the trapezoidal rule. We bound the error in this approximation using 
the Euler-Maclaurin formula. Since the Euler-Maclaurin formula applies to finite intervals, we first 
fix the step size h, use the Euler-Maclaurin formula to bound the error in the approximation over 
the interval [—bh,bh] (where b is some positive integer), and then let b go to 00. This allows us to 
approximate the integral over [— 00,00] by an infinite sum of exponentials. In the next section, we 
truncate this sum to obtain our final approximation. 

We are given e, 5 S (0, 1]. Fix an x G [5, 1], the step size h = ((log 1 /e)~ 2 ) , and the order of the 
Euler-Maclaurin formula, iV = O (log Ye) (exact parameters to be specified later). For any positive 
integer b, applying the order N Euler-Maclaurin formula to the integral f\ h f x (s)ds, and using 
bounds from Lemma 12. 11 we get, 



(2) 



6ft 



f x (s)ds - T 



-bh,bh],h 



bh 



fx 



<4 



h 

2^ 



2N 



N 



bh 



bh 



ds 



2ir J 



2/ 



f^H-bh) 



+ 



(2j-l) 



(bh) 



Now, we can use Fact ll.3l to bound the derivatives in the last term of ([2]). Fact II .31 implies that for any 
s and any positive integer k, < f x (s)(k + l) fc+1 max{l, (xe s ) k }. Thus, for b > — ^ log ^, we 

have xe~ bh < 1 and |/^(— bh)\ < e~ bh (k + l) fc+1 , and hence f( k \—bh) vanishes for any fixed k and 

h, as b goes to 00. Also, for any x > 0, and b > \ log |, we get, \f^ k \bh)\ < x k e^ k+V)bh ~ xebh (k + l) fc+1 , 
which again vanishes for any fixed k and h, as b goes to 00. Thus, letting b go to 00 and observing 



that T 



-bh,bh],h 



fx 



converges to hY,j & zfx{jh), © implies, 



(3) 



f x {s)ds - h^2f x (jh) 



< 4 



2vr J 



2N 



ds. 



Hence, since the derivatives of the function f x (s) vanish as s goes to ±00, the error in approximating 
the integral over [—00, 00] is just controlled by its smoothness. Since we already know f x is a very 



3 f &vds = uv- fu^ds. 

J as J ds 



smooth function, we are in good shape. Using Lemma 11.41 we get, {^) 2N f_ 



oo 
oo 



f? N \s) 



I ( {2N+ 2 f eh Y N ■ Thus, if we let h = e ^ +1 p , and N = \ \ log f ] , © implies that, 



ds < 



(4) 



/oo 
f x (s)ds-hJ2fx(jh) 



< 



-2N _ jj_ < £^ 

x — 3 x 



Also note that the above approximation holds for all x > 0. Thus, in particular, we can approximate 
the function x _1 over [5, 1] as an (infinite) sum of exponentials. 



2.3. Truncating the Infinite Sum and Proof of Lemma 11.21 Now, we want to truncate the 
infinite sum of exponentials approximating x _1 given by (j4j). Since the function f x (s) = e s ■ e~ xe is 

non-decreasing for s < log 1 /x, we majorize the lower tail by an integral. For A = f log ~ 
^ log j (since x < 1), 



< < 



(5) 



h Y e jh ■ e~ xeJh < h 



e jh . ( 



-xt 



df 



< 



el 
3 x 



Again, for the upper tail, since the function f x (s) = e s ■ e X£S is non-increasing for s > log^, we 
majorize by an integral. For B = |~ ^ log Q log f ) ] > ^ log ^ (since x > 5 and e < 1), 



(6) 



~ xejh < h 



Jh 



dj 



13 



-xt 



dt 



-xe Bh < £l 



3 x 



Before we complete the proof, we list here the setting of all parameters for completeness: 

2vr 



N 



1 , 24 

2 g T 



Thus, combining ([5]) and 



e 2 (2iV+l)2' A 
>]), the final error is given by, 
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_-l log 










1)] 




e 











- - h V e> h ■ e~ 3 

x 

j>A 



jh 



< 



j<A 



j>B 



Hence, (1 — e)x 1 < X^j>A ' ie? '' 1 • e xeJ < (1 + e)x 1 , implying the claim of Lemma 11.21 
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